Conversation

@eshwarprasadS
Contributor

@eshwarprasadS eshwarprasadS commented Apr 30, 2025

This PR adds LongBench to the eval options.

Install extras with:

pip install instructlab-eval[longbench]

It uses the vLLM backend to serve the model for generation.

Run it like so:

evaluator = LongBenchEvaluator(
    model_path="path/to/model",
    num_gpus=N,
    output_file="path/to/results.json",
    eval_config={"batch_size": "auto"},
    vllm_config={"max_model_len": max_len}
)

results = evaluator.run()  # Returns LongBenchResult

The output JSON looks like this:

{
  "en_multidoc": 0.5424139838230786,
  "zh_multidoc": 0.24335639081098673,
  "en_singledoc": 0.4233139199560039,
  "zh_singledoc": 0.46157875457875464,
  "en_summ": 0.27244809337990245,
  "zh_summ": 0.1359562304911904,
  "en_fewshot": 0.45692449627485754,
  "zh_fewshot": 0.24416666666666667,
  "en_synthetic": 0.3799285714285714,
  "zh_synthetic": 0.4775,
  "code_avg": 0.30225,
  "overall_score": 0.3581670097645466
}
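
The sample numbers above are consistent with `overall_score` being the unweighted mean of the other eleven entries. A minimal sketch that checks this on the values shown (the helper name `mean_of_categories` is hypothetical and not part of the PR; the averaging rule is inferred from the numbers, not confirmed by the description):

```python
import math

# Scores copied from the sample output above.
sample = {
    "en_multidoc": 0.5424139838230786,
    "zh_multidoc": 0.24335639081098673,
    "en_singledoc": 0.4233139199560039,
    "zh_singledoc": 0.46157875457875464,
    "en_summ": 0.27244809337990245,
    "zh_summ": 0.1359562304911904,
    "en_fewshot": 0.45692449627485754,
    "zh_fewshot": 0.24416666666666667,
    "en_synthetic": 0.3799285714285714,
    "zh_synthetic": 0.4775,
    "code_avg": 0.30225,
    "overall_score": 0.3581670097645466,
}


def mean_of_categories(results: dict[str, float]) -> float:
    """Unweighted mean of every entry except "overall_score" (hypothetical helper)."""
    scores = [v for k, v in results.items() if k != "overall_score"]
    return sum(scores) / len(scores)


recomputed = mean_of_categories(sample)
print(f"reported:   {sample['overall_score']:.6f}")
print(f"recomputed: {recomputed:.6f}")
print("match:", math.isclose(recomputed, sample["overall_score"], rel_tol=1e-9))
```

The same check can be pointed at a real results file by loading it with `json.load` in place of the hard-coded `sample` dict.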

@mergify mergify bot added dependencies Pull requests that update a dependency file ci-failure labels Apr 30, 2025
Member

@RobotSail RobotSail left a comment

Thanks for the PR @eshwarprasadS !

The PR has all of the right ideas; there are just a few minor changes you'll want to make, which I've outlined in this review. Once those are addressed, this should be good to merge.

) / 2

# Calculate overall score
all_scores = [v for k, v in eval_results.items() if k != "overall_score"]
Member

Why do we check if k != "overall_score"? We shouldn't have set this key yet
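
To make the question concrete, here is a minimal sketch (the scores and the helper name `overall` are hypothetical, not taken from the PR). When `"overall_score"` has not been written yet, the filtered and unfiltered versions select the same values; the guard only changes the result if the calculation runs again on a dict that already holds the key.

```python
def overall(eval_results: dict[str, float], guard: bool) -> float:
    """Mean of the scores, optionally skipping an existing "overall_score"."""
    if guard:
        scores = [v for k, v in eval_results.items() if k != "overall_score"]
    else:
        scores = list(eval_results.values())
    return sum(scores) / len(scores)


# Hypothetical scores; "overall_score" has not been set yet.
fresh = {"en_summ": 0.2, "zh_summ": 0.4, "code_avg": 0.6}
print(overall(fresh, guard=True) == overall(fresh, guard=False))

# Once the key is present, the unguarded version folds the old
# overall score into the new average and the two results diverge.
fresh["overall_score"] = overall(fresh, guard=True)
fresh["code_avg"] = 0.9  # a category score changes before recomputing
print(overall(fresh, guard=True), overall(fresh, guard=False))
```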

@mergify mergify bot added ci-failure and removed ci-failure labels May 1, 2025
@mergify mergify bot added ci-failure testing Relates to testing and removed ci-failure labels May 9, 2025
@mergify mergify bot removed the ci-failure label May 25, 2025
@RobotSail
Member

@eshwarprasadS It looks like you may need to rebase your changes

@mergify mergify bot added ci-failure and removed ci-failure labels May 25, 2025
@mergify mergify bot added ci-failure and removed ci-failure labels Jun 2, 2025
@RobotSail
Member

@mergify rebase

@mergify
Contributor

mergify bot commented Jun 2, 2025

rebase

✅ Branch has been successfully rebased

@mergify
Contributor

mergify bot commented Jun 2, 2025

This pull request has merge conflicts that must be resolved before it can be
merged. @eshwarprasadS please rebase it. https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

@mergify mergify bot added the needs-rebase label Jun 2, 2025
@RobotSail
Member

@eshwarprasadS It looks like you have a few merge conflicts that need to be fixed. Once those are solved, we can merge this.

Signed-off-by: Eshwar Prasad Sivaramakrishnan <[email protected]>
@mergify mergify bot removed the needs-rebase label Jun 6, 2025
@RobotSail RobotSail merged commit a9876d8 into instructlab:main Jun 17, 2025
16 checks passed
@mergify mergify bot added the one-approval label Jun 17, 2025
Labels

dependencies Pull requests that update a dependency file one-approval testing Relates to testing

2 participants